phenotype definition
sEHR-CE: Language modelling of structured EHR data for efficient and generalizable patient cohort expansion
Munoz-Farre, Anna, Rose, Harry, Cakiroglu, Sera Aylin
Electronic health records (EHR) offer unprecedented opportunities for in-depth clinical phenotyping and prediction of clinical outcomes. Combining multiple data sources is crucial to generate a complete picture of disease prevalence, incidence and trajectories. The standard approach to combining clinical data involves collating clinical terms across different terminology systems using curated maps, which are often inaccurate and/or incomplete. Here, we propose sEHR-CE, a novel framework based on transformers to enable integrated phenotyping and analyses of heterogeneous clinical datasets without relying on these mappings. We unify clinical terminologies using textual descriptors of concepts, and represent individuals' EHR as sections of text. We then fine-tune pre-trained language models to predict disease phenotypes more accurately than non-text and single terminology approaches. We validate our approach using primary and secondary care data from the UK Biobank, a large-scale research study. Finally, we illustrate in a type 2 diabetes use case how sEHR-CE identifies individuals without diagnosis that share clinical characteristics with patients.
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.94)
- Health & Medicine > Health Care Technology > Medical Record (0.88)
Assessing Phenotype Definitions for Algorithmic Fairness
Sun, Tony Y., Bhave, Shreyas, Altosaar, Jaan, Elhadad, Noémie
Disease identification is a core, routine activity in observational health research. Cohorts impact downstream analyses, such as how a condition is characterized, how patient risk is defined, and what treatments are studied. It is thus critical to ensure that selected cohorts are representative of all patients, independently of their demographics or social determinants of health. While there are multiple potential sources of bias when constructing phenotype definitions which may affect their fairness, it is not standard in the field of phenotyping to consider the impact of different definitions across subgroups of patients. In this paper, we propose a set of best practices to assess the fairness of phenotype definitions. We leverage established fairness metrics commonly used in predictive models and relate them to commonly used epidemiological cohort description metrics. We describe an empirical study for Crohn's disease and diabetes type 2, each with multiple phenotype definitions taken from the literature across two sets of patient subgroups (gender and race). We show that the different phenotype definitions exhibit widely varying and disparate performance according to the different fairness metrics and subgroups. We hope that the proposed best practices can help in constructing fair and inclusive phenotype definitions.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario (0.04)
- Europe > Germany (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.97)